Model Selection
A common approach is subset selection: we identify a subset of the predictors that we believe to be related to the response, and fit a model on that reduced set.
Best Subset Selection (compares $2^p$ models in total):
- Let $M_0$ be the null model, which contains no predictors and simply predicts the sample mean for each observation.
- For $k = 1, 2, \dots, p$ (here $p$ is the total number of features we have):
- Fit all $\binom{p}{k}$ models that contain exactly $k$ predictors.
- Pick the one with the smallest RSS / largest $R^2$, and fix that one as $M_k$.
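The loop above can be sketched in a few lines of NumPy. This is a minimal illustration, not an efficient implementation; the toy data (where only columns 0 and 2 drive the response) is made up for the example.

```python
import itertools
import numpy as np

def best_subset(X, y):
    """Best subset selection: for each size k, fit all C(p, k) least-squares
    models and keep the one with the smallest RSS.
    Returns a dict mapping k -> (best feature indices, RSS)."""
    n, p = X.shape
    best = {}
    for k in range(1, p + 1):
        best_rss, best_feats = np.inf, None
        for feats in itertools.combinations(range(p), k):
            # Least-squares fit with an intercept column
            A = np.column_stack([np.ones(n), X[:, feats]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = np.sum((y - A @ coef) ** 2)
            if rss < best_rss:
                best_rss, best_feats = rss, feats
        best[k] = (best_feats, best_rss)
    return best

# Toy example: y depends only on columns 0 and 2
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.1, size=100)
models = best_subset(X, y)
print(models[2][0])  # the best 2-predictor model should recover (0, 2)
```

Note the cost: for $p$ predictors this fits $2^p - 1$ non-null models, which is why best subset selection quickly becomes infeasible.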
Stepwise Subset Selection:
Forward Stepwise (compares $1 + p(p+1)/2$ models in total):
- Let $M_0$ be the null model.
- For $k = 0, 1, \dots, p - 1$:
- Consider all $p - k$ models that augment the predictors in $M_k$ with one additional predictor.
- Pick the one with the smallest RSS / largest $R^2$, and fix that one as $M_{k+1}$.
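A minimal sketch of the greedy forward search, under the same toy setup as before (only columns 0 and 2 matter, so they should be added first):

```python
import numpy as np

def rss(cols, y):
    """RSS of a least-squares fit with an intercept on the given columns."""
    A = np.column_stack([np.ones(len(y))] + cols)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.sum((y - A @ coef) ** 2)

def forward_stepwise(X, y):
    """Greedy forward selection: starting from the null model, repeatedly
    add the single predictor that lowers RSS the most.
    Returns the predictors in the order they were added."""
    n, p = X.shape
    selected, remaining = [], list(range(p))
    while remaining:
        best_j = min(remaining,
                     key=lambda j: rss([X[:, i] for i in selected] + [X[:, j]], y))
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# Toy example: y depends only on columns 0 and 2
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.1, size=100)
order = forward_stepwise(X, y)
print(order)  # the two true predictors should be added first
```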
Backward Stepwise (compares $1 + p(p+1)/2$ models in total):
- Let $M_p$ be the full model containing all $p$ predictors.
- For $k = p, p - 1, \dots, 1$:
- Consider all $k$ models that contain all but one of the predictors in $M_k$, for a total of $k - 1$ predictors.
- Pick the one with the smallest RSS / largest $R^2$, and fix that one as $M_{k-1}$.
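The backward search is the mirror image: drop the predictor whose removal hurts RSS the least. A sketch, again on made-up toy data where the noise columns 1 and 3 should be eliminated first:

```python
import numpy as np

def rss(cols, y):
    """RSS of a least-squares fit with an intercept on the given columns."""
    A = np.column_stack([np.ones(len(y))] + cols)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.sum((y - A @ coef) ** 2)

def backward_stepwise(X, y):
    """Greedy backward elimination: starting from the full model, repeatedly
    drop the predictor whose removal increases RSS the least.
    Returns the predictors in the order they were dropped."""
    n, p = X.shape
    remaining, dropped = list(range(p)), []
    while len(remaining) > 1:
        best_j = min(remaining,
                     key=lambda j: rss([X[:, i] for i in remaining if i != j], y))
        remaining.remove(best_j)
        dropped.append(best_j)
    return dropped

# Toy example: y depends only on columns 0 and 2
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.1, size=100)
dropped = backward_stepwise(X, y)
print(dropped)  # the irrelevant predictors should be dropped first
```

Note that this requires fitting the full model, which is why backward stepwise needs $n > p$.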
After subset selection, we have many candidate models $M_0, M_1, \dots, M_p$ fit on the training data. If we have a test set, we can compute the test MSE for each model and select the one with the lowest test error. If not, we can select a model without test data by using the cross-validated prediction error, $C_p$, AIC, BIC, or adjusted $R^2$.
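For reference, the standard definitions of these criteria for a least-squares model with $d$ predictors, where $\hat\sigma^2$ is an estimate of the error variance, are:

```latex
C_p = \frac{1}{n}\left(\mathrm{RSS} + 2d\,\hat\sigma^2\right)
\qquad
\mathrm{BIC} = \frac{1}{n}\left(\mathrm{RSS} + \log(n)\,d\,\hat\sigma^2\right)
\qquad
\text{Adjusted } R^2 = 1 - \frac{\mathrm{RSS}/(n - d - 1)}{\mathrm{TSS}/(n - 1)}
```

For least squares, AIC is proportional to $C_p$, so they select the same model. We choose the model with the smallest $C_p$ / AIC / BIC, or the largest adjusted $R^2$; since $\log n > 2$ for $n > 7$, BIC penalizes extra predictors more heavily and tends to pick smaller models.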
Pros and Cons
Best Subset Selection:
- can be used for other types of models (e.g. logistic regression)
- can't be used for large $p$
- much computation ($2^p$ model fits)
Forward Stepwise Selection:
- less computation ($1 + p(p+1)/2$ model fits)
- can be used in the high-dimensional setting with $n < p$
- may not find the best possible model, since the search is greedy
Backward Stepwise Selection:
- less computation ($1 + p(p+1)/2$ model fits)
- can only be used when $n > p$, since the full model must be fit first
- may not find the best possible model, since the search is greedy
Gradient Descent
We have gradient descent for
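As a minimal sketch, assuming the least-squares setting from the sections above, gradient descent iteratively moves the coefficients against the gradient of the cost; the step size and iteration count below are arbitrary choices for illustration.

```python
import numpy as np

def gradient_descent_ols(X, y, lr=0.1, n_iters=5000):
    """Minimize the least-squares cost (1/2n)||X beta - y||^2
    by gradient descent, starting from beta = 0."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iters):
        grad = X.T @ (X @ beta - y) / n  # gradient of the cost at beta
        beta -= lr * grad
    return beta

# Toy example: intercept 1, slope 2
x = np.linspace(0, 1, 50)
X = np.column_stack([np.ones(50), x])
y = 1 + 2 * x
beta = gradient_descent_ols(X, y)
print(beta)  # should converge near [1, 2]
```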